Smart Paper Notes

A Heuristic-driven Uncertainty based Ensemble Framework for Fake News Detection in Tweets and News Articles

Sourya Dipta Das , Ayan Basak , Saikat Dutta

Categories: Natural Language Processing | Artificial Intelligence

2021-04-05

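The significance of social media has increased manifold over the past few decades, as it helps people in even the most remote corners of the world stay connected. With the advent of technology, digital media has become more relevant and more widely used than ever before, and with this has come a resurgence in the circulation of fake news and tweets that demands immediate attention. In this paper, we describe a novel fake news detection system that automatically identifies whether a news item is "real" or "fake", as an extension of our work in the CONSTRAINT COVID-19 Fake News Detection in English challenge. We use an ensemble of pre-trained models followed by a statistical feature fusion network, together with a heuristic that incorporates various attributes present in news items or tweets, such as source, username handles, URL domains, and authors, as statistical features. Our proposed framework also quantifies reliable predictive uncertainty along with a proper class output confidence level for the classification task. We evaluate our results on the COVID-19 Fake News dataset and the FakeNewsNet dataset to show the effectiveness of the proposed algorithm at detecting fake news in short content as well as in news articles. We obtain a best F1-score of 0.9892 on the COVID-19 dataset and an F1-score of 0.9073 on the FakeNewsNet dataset.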

Translated by Google Translate

Observability-aware online multi-lidar extrinsic calibration

Sandipan Das , Ludvig af Klinteberg , Maurice Fallon , Saikat Chatterjee

Categories: Robotics

2022-12-19

Accurate and robust extrinsic calibration is necessary for deploying autonomous systems that need multiple sensors for perception. In this paper, we present a robust system for real-time extrinsic calibration of multiple lidars in the vehicle base frame without the need for any fiducial markers or features. We base our approach on matching absolute GNSS poses and estimated lidar poses in real time. Comparing rotation components makes the solution more robust than the traditional least-squares approach, which compares translation components only. Additionally, instead of comparing all corresponding poses, we select the poses that carry maximum mutual information according to our novel observability criteria. This allows us to identify a subset of the poses that is helpful for real-time calibration. We also provide stopping criteria for ensuring calibration completion. To validate our approach, extensive tests were carried out on data collected using Scania test vehicles (7 sequences for a total of ~6.5 km). The results presented in this paper show that our approach is able to accurately determine the extrinsic calibration for various combinations of sensor setups.

Guaranteed Conformance of Neurosymbolic Models to Natural Constraints

Kaustubh Sridhar , Souradeep Dutta , James Weimer , Insup Lee

Categories: Machine Learning | Artificial Intelligence | Robotics

2022-12-02

Deep neural networks have emerged as the workhorse for a large section of robotics and control applications, especially as models for dynamical systems. Such data-driven models are in turn used for designing and verifying autonomous systems. This is particularly useful in modeling medical systems, where data can be leveraged to individualize treatment. In safety-critical applications, it is important that the data-driven model conforms to established knowledge from the natural sciences. Such knowledge is often available or can often be distilled into a (possibly black-box) model $M$; an example is the unicycle model for an F1 racing car. In this light, we consider the following problem: given a model $M$ and a state transition dataset, we wish to best approximate the system model while remaining a bounded distance away from $M$. We propose a method to guarantee this conformance. Our first step is to distill the dataset into a few representative samples called memories, using the idea of a growing neural gas. Next, using these memories, we partition the state space into disjoint subsets and compute bounds that should be respected by the neural network when the input is drawn from a particular subset. This serves as a symbolic wrapper for guaranteed conformance. We argue theoretically that this leads to only a bounded increase in approximation error, which can be controlled by increasing the number of memories. We experimentally show that on three case studies (Car Model, Drones, and Artificial Pancreas), our constrained neurosymbolic models conform to the specified $M$ models (each encoding various constraints) with order-of-magnitude improvements compared to the augmented Lagrangian and vanilla training methods.
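
The symbolic-wrapper idea in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the memories are hand-picked rather than learned with a growing neural gas, the partition is a nearest-memory (Voronoi) assignment, and the per-cell bounds are simply the value of $M$ at the memory plus or minus a hypothetical tolerance `delta`. The names `nearest_memory`, `constrained_predict`, `reference_model`, and `delta` are all made up for this example.

```python
import numpy as np

def nearest_memory(x, memories):
    """Index of the memory closest to x (defines the cell x falls in)."""
    return int(np.argmin(np.linalg.norm(memories - x, axis=1)))

def constrained_predict(x, network, memories, lower, upper):
    """Clip the network output to the bounds of the cell that contains x."""
    cell = nearest_memory(x, memories)
    return np.clip(network(x), lower[cell], upper[cell])

# Hypothetical setup: a 1-D state, reference model M(x) = 2x, tolerance 0.1.
reference_model = lambda x: 2.0 * x
delta = 0.1
memories = np.array([[0.0], [1.0]])
lower = reference_model(memories) - delta   # assumed per-cell lower bounds
upper = reference_model(memories) + delta   # assumed per-cell upper bounds

network = lambda x: 5.0 * x                 # stand-in for an unconstrained model
y = constrained_predict(np.array([0.9]), network, memories, lower, upper)
print(y)  # the raw output 4.5 is clipped into the cell's interval [1.9, 2.1]
```

Clipping guarantees the output stays inside the chosen bounds for every input in a cell; how tight those bounds should be, and how the cells are chosen, is what the paper itself addresses.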

Controlling Commercial Cooling Systems Using Reinforcement Learning

Jerry Luo , Cosmin Paduraru , Octavian Voicu , Yuri Chervonyi , Scott Munns , Jerry Li , Crystal Qian , Praneet Dutta , Jared Quincy Davis , Ningjia Wu

Categories: Machine Learning | Artificial Intelligence

2022-11-11

This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.

AX-MABSA: A Framework for Extremely Weakly Supervised Multi-label Aspect Based Sentiment Analysis

Sabyasachi Kamila , Walid Magdy , Sourav Dutta , MingXue Wang

Categories: Natural Language Processing

2022-11-07

Aspect Based Sentiment Analysis is a dominant research area with potential applications in social media analytics, business, finance, and health. Prior works in this area are primarily based on supervised methods, with a few techniques using weak supervision that are limited to predicting a single aspect category per review sentence. In this paper, we present an extremely weakly supervised multi-label Aspect Category Sentiment Analysis framework that does not use any labelled data. We rely only on a single word per class as initial indicative information. We further propose an automatic word selection technique to choose these seed category and sentiment words. We explore unsupervised language model post-training to improve the overall performance, and propose a multi-label generator model to generate multiple aspect category-sentiment pairs per review sentence. Experiments conducted on four benchmark datasets show that our method outperforms other weakly supervised baselines by a significant margin.

Issues and Challenges in Applications of Artificial Intelligence to Nuclear Medicine -- The Bethesda Report (AI Summit 2022)

Arman Rahmim , Tyler J. Bradshaw , Irène Buvat , Joyita Dutta , Abhinav K. Jha , Paul E. Kinahan , Quanzheng Li , Chi Liu , Melissa D. McCradden , Babak Saboury

Categories: Artificial Intelligence

2022-11-07

The SNMMI Artificial Intelligence (SNMMI-AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD on March 21-22, 2022. It brought together various community members and stakeholders from academia, healthcare, industry, patient representatives, and government (NIH, FDA), and considered various key themes to envision and facilitate a bright future for routine, trustworthy use of AI in nuclear medicine. In what follows, essential issues, challenges, controversies and findings emphasized in the meeting are summarized.

Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity

Faisal Hamman , Jiahao Chen , Sanghamitra Dutta

Categories: Artificial Intelligence | Machine Learning

2022-11-03

Existing regulations prohibit model developers from accessing protected attributes (gender, race, etc.), often resulting in fairness assessments on populations without knowing their protected groups. In such scenarios, institutions often adopt a separation between the model developers (who train models with no access to the protected attributes) and a compliance team (who may have access to the entire dataset for auditing purposes). However, the model developers might be allowed to test their models for bias by querying the compliance team for group fairness metrics. In this paper, we first demonstrate that simply querying for fairness metrics, such as statistical parity and equalized odds, can leak the protected attributes of individuals to the model developers. We demonstrate that there always exist strategies by which the model developers can identify the protected attribute of a targeted individual in the test dataset from just a single query. In particular, we show that one can reconstruct the protected attributes of all the individuals from $O(N_k \log(n/N_k))$ queries when $N_k \ll n$ using techniques from compressed sensing ($n$: size of the test dataset, $N_k$: size of the smallest group). Our results pose an interesting debate in algorithmic fairness: should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or does it constitute a violation of regulations and privacy if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this supposed violation, we also propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query, outperforming naive techniques such as the Laplace mechanism. We also include experimental results on the Adult dataset and on synthetic data (over a broad range of parameters).
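
The single-query leak described in this abstract can be illustrated with a toy example. The sketch below is one simple strategy under assumed conditions (a binary protected attribute, both groups non-empty, an exact and noise-free answer from the compliance team); it is not necessarily the construction used in the paper, and the function names are hypothetical. The developer submits predictions that are positive only for the targeted individual, so the sign of the returned statistical parity gap reveals that individual's group.

```python
def statistical_parity_gap(preds, protected):
    """Compliance-side query: P(pred=1 | A=1) - P(pred=1 | A=0)."""
    group1 = [p for p, a in zip(preds, protected) if a == 1]
    group0 = [p for p, a in zip(preds, protected) if a == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)

def infer_protected_attribute(target_index, n, query):
    """Developer-side strategy: predict 1 for the target only, read the sign."""
    preds = [0] * n
    preds[target_index] = 1
    return 1 if query(preds) > 0 else 0

# Toy data: the developer never sees `protected`, only the query answers.
protected = [1, 0, 0, 1, 0, 0, 0, 1]
query = lambda preds: statistical_parity_gap(preds, protected)

recovered = [infer_protected_attribute(i, len(protected), query)
             for i in range(len(protected))]
print(recovered == protected)
```

This toy loop spends one query per individual; the abstract's compressed-sensing result concerns recovering everyone's attributes with far fewer queries, and Attribute-Conceal counters the leak by adding calibrated noise to the answer.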

Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning

William Wong , Praneet Dutta , Octavian Voicu , Yuri Chervonyi , Cosmin Paduraru , Jerry Luo

Categories: Machine Learning | Artificial Intelligence | Robotics

2022-09-16

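Reinforcement learning (RL) techniques have been developed to optimize industrial cooling systems, offering substantial energy savings compared to traditional heuristic policies. A major challenge in industrial control involves learning behaviors that are feasible in the real world given machinery constraints. For example, certain actions can only be executed every few hours, while other actions can be taken more frequently. Without extensive reward engineering and experimentation, an RL agent may not learn realistic operation of machinery. To address this, we use hierarchical reinforcement learning with multiple agents that control subsets of actions according to their operation time scales. Our hierarchical approach achieves energy savings over existing baselines while maintaining constraints, such as operating chillers within safe bounds, in a simulated HVAC control environment.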

Personalized Federated Learning with Communication Compression

El Houcine Bergou , Konstantin Burlachenko , Aritra Dutta , Peter Richtárik

Categories: Machine Learning | Artificial Intelligence

2022-09-12

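In contrast to training traditional machine learning (ML) models in data centers, federated learning (FL) trains ML models over local datasets held on resource-constrained, heterogeneous edge devices. Existing FL algorithms aim to learn a single global model for all participating devices, which may not be helpful to every device participating in the training because of the heterogeneity of the data across devices. Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models that aims to balance the trade-off between the traditional global model and the local models that individual devices could train using only their private data. They derived a new algorithm, called Loopless Gradient Descent (L2GD), to solve it and showed that this algorithm leads to improved communication complexity in regimes where more personalization is required. In this paper, we equip their L2GD algorithm with a bidirectional compression mechanism to further reduce the communication bottleneck between the local devices and the server. Unlike other compression-based algorithms used in the FL setting, our compressed L2GD algorithm operates on a probabilistic communication protocol, in which communication does not happen on a fixed schedule. Moreover, our compressed L2GD algorithm maintains a convergence rate similar to that of vanilla SGD without compression. To validate the efficiency of our algorithm, we perform diverse numerical experiments on both convex and non-convex problems, using various compression techniques.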
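
To make the probabilistic communication protocol concrete, here is a rough sketch of an L2GD-style loop with compression in both directions. It rests on several assumptions that the abstract does not state: the local losses are toy quadratics $f_i(x_i) = \frac{1}{2}\|x_i - b_i\|^2$, the compressor is an unbiased random-$k$ sparsifier, the step scalings follow the usual presentation of L2GD (local steps with probability $1-p$, aggregation steps with probability $p$), and every aggregation step communicates. The function names and parameters are hypothetical, and this is not the authors' algorithm as published.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased random sparsifier: keep k random coordinates, rescale by d/k."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

def compressed_l2gd(targets, steps, alpha, lam, p, k, seed=0):
    """Toy L2GD-style loop; row i of `targets` is b_i for device i."""
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    x = np.zeros((n, d))                      # one personalized model per device
    for _ in range(steps):
        if rng.random() < p:
            # Communication round: compressed uploads, compressed broadcast.
            uploads = np.stack([rand_k(x[i], k, rng) for i in range(n)])
            avg = rand_k(uploads.mean(axis=0), k, rng)
            x -= (alpha * lam / (n * p)) * (x - avg)   # pull toward the average
        else:
            # Local round: gradient of 0.5 * ||x_i - b_i||^2, no communication.
            x -= (alpha / (n * (1 - p))) * (x - targets)
    return x

targets = np.arange(12, dtype=float).reshape(3, 4)
models = compressed_l2gd(targets, steps=200, alpha=0.1, lam=1.0, p=0.2, k=2)
print(models.shape)
```

The coin flip is what makes the schedule probabilistic: a smaller `p` means fewer communication rounds, and `lam` controls how strongly each device's model is pulled toward the average instead of staying personalized.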

Suppressing Noise from Built Environment Datasets to Reduce Communication Rounds for Convergence of Federated Learning

Rahul Mishra , Hari Prabhat Gupta , Tanima Dutta , Sajal K. Das

Categories: Machine Learning

2022-09-03

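Smart sensing provides an easier and more convenient data-driven mechanism for monitoring and control in the built environment. Data generated in the built environment are privacy-sensitive and limited. Federated learning is an emerging paradigm that provides privacy-preserving collaboration among multiple participants for model training without sharing private and limited data. Noisy labels in the participants' datasets degrade performance and increase the number of communication rounds needed for federated learning to converge. Such a large number of communication rounds requires more time and energy to train the model. In this paper, we propose a federated learning approach to suppress the unequal distribution of noisy labels in the dataset of each participant. The approach first estimates the noise ratio of each participant's dataset and normalizes the noise ratio using the server dataset. The proposed approach can handle bias in the server dataset and minimizes its impact on the participants' datasets. Next, we calculate the optimal weighted contributions of the participants using the normalized noise ratio and the influence of each participant. We further derive an expression to estimate the number of communication rounds required for the proposed approach to converge. Finally, experimental results demonstrate the effectiveness of the proposed approach over existing techniques in terms of communication rounds and achieved performance in the built environment.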
